AITopics | batch statistics


Buffer layers for Test-Time Adaptation

Kim, Hyeongyu, Han, Geonhui, Hwang, Dosik

arXiv.org Artificial Intelligence | Oct-31-2025

In recent advancements in Test Time Adaptation (TTA), most existing methodologies focus on updating normalization layers to adapt to the test domain. However, the reliance on normalization-based adaptation presents key challenges. First, normalization layers such as Batch Normalization (BN) are highly sensitive to small batch sizes, leading to unstable and inaccurate statistics. Moreover, normalization-based adaptation is inherently constrained by the structure of the pre-trained model, as it relies on training-time statistics that may not generalize well to unseen domains. These issues limit the effectiveness of normalization-based TTA approaches, especially under significant domain shift. In this paper, we introduce a novel paradigm based on the concept of a Buffer layer, which addresses the fundamental limitations of normalization layer updates. Unlike existing methods that modify the core parameters of the model, our approach preserves the integrity of the pre-trained backbone, inherently mitigating the risk of catastrophic forgetting during online adaptation. Through comprehensive experimentation, we demonstrate that our approach not only outperforms traditional methods in mitigating domain shift and enhancing model robustness, but also exhibits strong resilience to forgetting. Furthermore, our Buffer layer is modular and can be seamlessly integrated into nearly all existing TTA frameworks, resulting in consistent performance improvements across various architectures. These findings validate the effectiveness and versatility of the proposed solution in real-world domain adaptation scenarios. The code is available at https://github.com/hyeongyu-kim/Buffer_TTA.

adaptation, artificial intelligence, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2510.21271

Genre: Research Report > New Finding (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

AdverX-Ray: Ensuring X-Ray Integrity Through Frequency-Sensitive Adversarial VAEs

Caetano, Francisco, Viviers, Christiaan, Filatova, Lena, de With, Peter H. N., van der Sommen, Fons

arXiv.org Artificial Intelligence | Feb-23-2025

Ensuring the quality and integrity of medical images is crucial for maintaining diagnostic accuracy in deep learning-based Computer-Aided Diagnosis and Computer-Aided Detection (CAD) systems. Covariate shifts are subtle variations in the data distribution caused by different imaging devices or settings and can severely degrade model performance, similar to the effects of adversarial attacks. Therefore, it is vital to have a lightweight and fast method to assess the quality of these images prior to using CAD models. AdverX-Ray addresses this need by serving as an image-quality assessment layer, designed to detect covariate shifts effectively. This Adversarial Variational Autoencoder prioritizes the discriminator's role, using the suboptimal outputs of the generator as negative samples to fine-tune the discriminator's ability to identify high-frequency artifacts. Images generated by adversarial networks often exhibit severe high-frequency artifacts, guiding the discriminator to focus excessively on these components. This makes the discriminator ideal for this approach. Trained on patches from X-ray images of specific machine models, AdverX-Ray can evaluate whether a scan matches the training distribution, or if a scan from the same machine is captured under different settings. Extensive comparisons with various OOD detection methods show that AdverX-Ray significantly outperforms existing techniques, achieving a 96.2% average AUROC using only 64 random patches from an X-ray. Its lightweight and fast architecture makes it suitable for real-time applications, enhancing the reliability of medical imaging systems. The code and pretrained models are publicly available.

adverx-ray, artificial intelligence, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2502.1661

Country:

Europe > Netherlands > North Brabant > Eindhoven (0.04)
North America > United States (0.04)
Europe > Spain > Valencian Community (0.04)

Genre: Research Report (0.64)

Industry: Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.54)

Review for NeurIPS paper: Stochastic Normalization

Neural Information Processing Systems | Feb-5-2025, 06:19:47 GMT

Though the reviewers remark that the paper brings no insights or analysis, it was well received on average as an empirical architecture design idea that addresses an important problem. The experimental validation is conducted to the standards of the field and shows that the method is empirically useful. The combination BSS + StochNorm is particularly promising. The authors are invited to submit the final version, considering the following improvements:

- The paper can be condensed to avoid self-repetition and redundancy (in the definitions of the normalizations, in the descriptions of the contribution, which appear about three times, and in the coverage of existing methods, the algorithm and its description, and Fig. 1).
- The space saved and the 9th page could be used to clarify details of the experimental setup that are needed to understand the basis of comparison: how the hyperparameters are chosen per method, and whether the 5 trials include a random train-validation split. Include the additional results from the rebuttal and discuss the literature further along the points below.

As pointed out by reviewers, using moving averages is fundamentally different from using batch statistics in that the moving average is treated as a constant during back-propagation.
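The distinction the reviewers draw between batch statistics and moving averages can be checked numerically. The following is a minimal sketch, not taken from the reviewed paper: it centers one value either by the mean of the current batch or by a stored running mean, and estimates the derivative of the centered output with respect to its own input by finite differences. The function names, the example batch, and the `running_mean` value are assumptions made for illustration.

```python
def center_with_batch_mean(batch, i):
    """Return batch[i] minus the mean of the current batch."""
    mean = sum(batch) / len(batch)
    return batch[i] - mean


def center_with_moving_average(batch, i, running_mean):
    """Return batch[i] minus a stored running mean (treated as a constant)."""
    return batch[i] - running_mean


def finite_difference(fn, batch, i, eps=1e-6):
    """Estimate d fn(batch) / d batch[i] with a central difference."""
    plus = list(batch)
    minus = list(batch)
    plus[i] += eps
    minus[i] -= eps
    return (fn(plus) - fn(minus)) / (2 * eps)


batch = [1.0, 2.0, 4.0, 9.0]
running_mean = 3.5  # hypothetical stored statistic

grad_batch = finite_difference(
    lambda b: center_with_batch_mean(b, 0), batch, 0
)
grad_moving = finite_difference(
    lambda b: center_with_moving_average(b, 0, running_mean), batch, 0
)

# With batch statistics the mean itself depends on x_0, so the
# derivative is 1 - 1/N; with a constant running mean it is 1.
print(f"batch mean:     {grad_batch:.4f}")
print(f"moving average: {grad_moving:.4f}")
```

With a batch of four values, the batch-mean variant has derivative 1 - 1/4, while the moving-average variant has derivative 1 because no gradient flows through the stored statistic.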

batch statistics, normalization, stochastic normalization, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

DisCoPatch: Batch Statistics Are All You Need For OOD Detection, But Only If You Can Trust Them

Caetano, Francisco, Viviers, Christiaan, Zavala-Mondragón, Luis A., de With, Peter H. N., van der Sommen, Fons

arXiv.org Artificial Intelligence | Jan-14-2025

Out-of-distribution (OOD) detection holds significant importance across many applications. While semantic and domain-shift OOD problems are well-studied, this work focuses on covariate shifts - subtle variations in the data distribution that can degrade machine learning performance. We hypothesize that detecting these subtle shifts can improve our understanding of in-distribution boundaries, ultimately improving OOD detection. In adversarial discriminators trained with Batch Normalization (BN), real and adversarial samples form distinct domains with unique batch statistics - a property we exploit for OOD detection. We introduce DisCoPatch, an unsupervised Adversarial Variational Autoencoder (VAE) framework that harnesses this mechanism. During inference, batches consist of patches from the same image, ensuring a consistent data distribution that allows the model to rely on batch statistics. DisCoPatch uses the VAE's suboptimal outputs (generated and reconstructed) as negative samples to train the discriminator, thereby improving its ability to delineate the boundary between in-distribution samples and covariate shifts. By tightening this boundary, DisCoPatch achieves state-of-the-art results in public OOD detection benchmarks. The proposed model not only excels in detecting covariate shifts, achieving 95.5% AUROC on ImageNet-1K(-C), but also outperforms all prior methods on public Near-OOD (95.0%) benchmarks. With a compact model size of 25MB, it achieves high OOD detection performance at notably lower latency than existing methods, making it an efficient and practical solution for real-world OOD detection applications. The code will be made publicly available.

artificial intelligence, detection, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2501.08005

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Leveraging Normalization Layer in Adapters With Progressive Learning and Adaptive Distillation for Cross-Domain Few-Shot Learning

Yang, Yongjin, Kim, Taehyeon, Yun, Se-Young

arXiv.org Artificial Intelligence | Dec-18-2023

Cross-domain few-shot learning presents a formidable challenge, as models must be trained on base classes and then tested on novel classes from various domains with only a few samples at hand. While prior approaches have primarily focused on parameter-efficient methods of using adapters, they often overlook two critical issues: shifts in batch statistics and noisy sample statistics arising from domain discrepancy variations. In this paper, we introduce a novel generic framework that leverages normalization layer in adapters with Progressive Learning and Adaptive Distillation (ProLAD), marking two principal contributions. First, our methodology utilizes two separate adapters: one devoid of a normalization layer, which is more effective for similar domains, and another embedded with a normalization layer, designed to leverage the batch statistics of the target domain, thus proving effective for dissimilar domains. Second, to address the pitfalls of noisy statistics, we deploy two strategies: a progressive training of the two adapters and an adaptive distillation technique derived from features determined by the model solely with the adapter devoid of a normalization layer. Through this adaptive distillation, our approach functions as a modulator, controlling the primary adapter for adaptation, based on each domain. Evaluations on standard cross-domain few-shot learning benchmarks confirm that our technique outperforms existing state-of-the-art methodologies.

arXiv.org Artificial Intelligence

2312.1126

Country: Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Practical Batch Bayesian Sampling Algorithms for Online Adaptive Traffic Experimentation

Zhang, Zezhong, Yuan, Ted

arXiv.org Artificial Intelligence | Sep-17-2023

Online controlled experiments have emerged as the industry gold standard for assessing new web features. As new web algorithms proliferate, experimentation platforms face increasing demand for faster online experiments, which encourages adaptive traffic testing methods that identify the best variant sooner by allocating traffic efficiently. This paper proposes four Bayesian batch bandit algorithms (NB-TS, WB-TS, NB-TTTS, WB-TTTS) for eBay's experimentation platform, using summary batch statistics of a goal metric without incurring new engineering technical debt. The novel WB-TTTS, in particular, proves to be an efficient, trustworthy and robust alternative to fixed-horizon A/B testing. Another novel contribution is to bring the trustworthiness of best-arm identification algorithms into the evaluation criteria and to highlight the existence of severe false positive inflation with equivalent best arms. To gain the trust of experimenters, an experimentation platform must consider both efficiency and trustworthiness; however, to the best of the authors' knowledge, trustworthiness is rarely discussed as a topic in its own right. This paper shows that Bayesian bandits without neutral posterior reshaping, particularly naive Thompson sampling (NB-TS), are untrustworthy because they can always identify an arm as the best from equivalent best arms. To restore trustworthiness, a novel finding uncovers connections between the convergence distribution of posterior optimal probabilities of equivalent best arms and neutral posterior reshaping, which controls false positives. Lastly, this paper presents lessons learned from eBay's experience, as well as thorough evaluations. We hope this work is useful to other industrial practitioners and inspires academic researchers interested in the trustworthiness of adaptive traffic experimentation.
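To make the idea of a batch bandit driven by summary statistics concrete, here is a minimal sketch of generic Beta-Bernoulli Thompson sampling with batched updates. It is not the paper's NB-TS, WB-TS, NB-TTTS, or WB-TTTS, and it includes no neutral posterior reshaping. The binary goal metric, the conversion rates, the batch size, and all function names are assumptions made for illustration.

```python
import random


def optimal_arm_probabilities(posteriors, rng, draws=2000):
    """Monte Carlo estimate of P(arm is best) under Beta posteriors."""
    wins = [0] * len(posteriors)
    for _ in range(draws):
        samples = [rng.betavariate(a, b) for a, b in posteriors]
        wins[samples.index(max(samples))] += 1
    return [w / draws for w in wins]


def update_with_batch(posteriors, batch_summary):
    """Update each Beta(a, b) from per-arm (successes, trials) totals."""
    return [
        (a + s, b + (n - s))
        for (a, b), (s, n) in zip(posteriors, batch_summary)
    ]


rng = random.Random(0)
true_rates = [0.10, 0.12]               # hypothetical conversion rates
posteriors = [(1.0, 1.0), (1.0, 1.0)]   # uniform Beta(1, 1) priors
batch_size = 1000                       # hypothetical users per batch

for batch_index in range(5):
    # The traffic split for this batch follows the current P(best).
    split = optimal_arm_probabilities(posteriors, rng)
    summary = []
    for rate, share in zip(true_rates, split):
        trials = round(batch_size * share)
        successes = sum(rng.random() < rate for _ in range(trials))
        summary.append((successes, trials))
    # Only the per-arm totals are needed, not individual events.
    posteriors = update_with_batch(posteriors, summary)
    print(batch_index, [round(p, 3) for p in split], summary)
```

The update step consumes only per-arm success and trial counts for each batch, which is the sense in which such a scheme can run on summary batch statistics.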

batch, best arm, experiment, (15 more...)

arXiv.org Artificial Intelligence

2305.14704

Country:

North America > United States > New York > New York County > New York City (0.14)
North America > United States > California > Santa Clara County > San Jose (0.04)
North America > United States > Maryland > Baltimore (0.04)
(6 more...)

Genre: Research Report > Experimental Study (1.00)

Industry: Information Technology > Services (0.70)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.92)
Information Technology > Data Science > Data Mining > Big Data (0.67)

Revisiting adapters with adversarial training

Rebuffi, Sylvestre-Alvise, Croce, Francesco, Gowal, Sven

arXiv.org Artificial Intelligence | Oct-10-2022

While adversarial training is generally used as a defense mechanism, recent works show that it can also act as a regularizer. By co-training a neural network on clean and adversarial inputs, it is possible to improve classification accuracy on the clean, non-adversarial inputs. We demonstrate that, contrary to previous findings, it is not necessary to separate batch statistics when co-training on clean and adversarial inputs, and that it is sufficient to use adapters with few domain-specific parameters for each type of input. We establish that using the classification token of a Vision Transformer (ViT) as an adapter is enough to match the classification performance of dual normalization layers, while using significantly fewer additional parameters. First, we improve upon the top-1 accuracy of a non-adversarially trained ViT-B16 model by +1.12% on ImageNet (reaching 83.76% top-1 accuracy). Second, and more importantly, we show that training with adapters enables model soups through linear combinations of the clean and adversarial tokens. These model soups, which we call adversarial model soups, allow us to trade off between clean and robust accuracy without sacrificing efficiency. Finally, we show that we can easily adapt the resulting models in the face of distribution shifts. Our ViT-B16 obtains top-1 accuracies on ImageNet variants that are on average +4.00% better than those obtained with Masked Autoencoders.
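The "linear combinations of the clean and adversarial tokens" mentioned above can be sketched as a simple weighted sum. This is an illustration only: the single `weight` parameterization, the token values, and the function name are assumptions, and the paper's exact weighting scheme may differ.

```python
def soup(clean_token, adversarial_token, weight):
    """Return weight * clean + (1 - weight) * adversarial, element-wise."""
    if len(clean_token) != len(adversarial_token):
        raise ValueError("tokens must have the same length")
    return [
        weight * c + (1.0 - weight) * a
        for c, a in zip(clean_token, adversarial_token)
    ]


clean_token = [0.2, -1.0, 0.5]        # hypothetical learned values
adversarial_token = [0.6, 0.0, -0.5]  # hypothetical learned values

# weight = 1.0 recovers the clean token, weight = 0.0 the adversarial one;
# values in between mix the two.
for weight in (0.0, 0.5, 1.0):
    print(weight, soup(clean_token, adversarial_token, weight))
```

Because only the small adapter token is mixed and the shared backbone is untouched, sweeping `weight` in a sketch like this costs no additional forward passes per setting beyond the one being evaluated.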

accuracy, artificial intelligence, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2210.04886

Country:

North America > Canada > Ontario > Toronto (0.04)
Europe > Romania > Sud - Muntenia Development Region > Giurgiu County > Giurgiu (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

BYOL tutorial: self-supervised learning on CIFAR images with code in Pytorch

#artificialintelligence | May-12-2022, 19:30:54 GMT

After presenting SimCLR, a contrastive self-supervised learning framework, I decided to demonstrate another well-known method called BYOL. Bootstrap Your Own Latent (BYOL) is a new algorithm for self-supervised learning of image representations. It does not explicitly use negative samples, that is, images from the batch other than the positive pair. As a result, BYOL is claimed to require smaller batch sizes, which makes it an attractive choice.

accuracy, byol, self-supervised learning, (15 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.85)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.54)

Test-time Batch Statistics Calibration for Covariate Shift

You, Fuming, Li, Jingjing, Zhao, Zhou

arXiv.org Artificial Intelligence | Oct-6-2021

Deep neural networks degrade clearly when applied to unseen environments because of covariate shift. Conventional approaches such as domain adaptation require pre-collected target data for iterative training, which is impractical in real-world applications. In this paper, we propose to adapt deep models to the novel environment during inference. A previous solution is test-time normalization, which substitutes the source statistics in BN layers with the target batch statistics. However, we show that test-time normalization may deteriorate the discriminative structures because of the mismatch between target batch statistics and source parameters. To this end, we present a general formulation, α-BN, that calibrates the batch statistics by mixing the source and target statistics, both alleviating the domain shift and preserving the discriminative structures. Based on α-BN, we further present a novel loss function to form a unified test-time adaptation framework, Core, which performs pairwise class correlation online optimization. Extensive experiments show that our approaches achieve state-of-the-art performance on a total of twelve datasets from three topics, including model robustness to corruptions, domain generalization on image classification, and semantic segmentation. In particular, our α-BN improves results from 28.4% to 43.9% on GTA5 → Cityscapes without any training, even outperforming the latest source-free domain adaptation method.

Deep neural networks (DNNs) achieve impressive success across various applications, but rely heavily on the independent and identically distributed (i.i.d.) assumption. However, in real-world applications, the model is prone to encountering novel instances. For example, an autopilot should perform robustly under different weather conditions.
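The mixing of source and target statistics described in the abstract can be sketched for a single feature channel as follows. This is a hedged illustration that assumes a convex combination of means and variances weighted by α; the exact formulation in the paper may differ, the learned affine scale and shift of BN are omitted, and the function name and example values are hypothetical.

```python
import math


def alpha_bn(batch, source_mean, source_var, alpha, eps=1e-5):
    """Normalize one channel with mixed source and target statistics.

    alpha = 1.0 uses only the stored source statistics (ordinary BN
    inference); alpha = 0.0 uses only the current test batch
    (test-time normalization).
    """
    n = len(batch)
    target_mean = sum(batch) / n
    target_var = sum((x - target_mean) ** 2 for x in batch) / n
    # Assumed form: convex combination of source and target statistics.
    mean = alpha * source_mean + (1.0 - alpha) * target_mean
    var = alpha * source_var + (1.0 - alpha) * target_var
    return [(x - mean) / math.sqrt(var + eps) for x in batch]


test_batch = [2.0, 4.0, 6.0, 8.0]   # hypothetical shifted activations
source_mean, source_var = 0.0, 1.0  # hypothetical training statistics

for alpha in (0.0, 0.5, 1.0):
    out = alpha_bn(test_batch, source_mean, source_var, alpha)
    print(alpha, [round(v, 3) for v in out])
```

Intermediate values of `alpha` interpolate between the two extremes, which is one way to read the stated goal of reducing domain shift while keeping the source statistics in play.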

artificial intelligence, machine learning, statistics, (18 more...)

arXiv.org Artificial Intelligence

2110.04065

Country: Asia > China (0.04)

Genre: Research Report (1.00)

Industry: Leisure & Entertainment (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.54)

BYOL works even without batch statistics

Richemond, Pierre H., Grill, Jean-Bastien, Altché, Florent, Tallec, Corentin, Strub, Florian, Brock, Andrew, Smith, Samuel, De, Soham, Pascanu, Razvan, Piot, Bilal, Valko, Michal

arXiv.org Machine Learning | Oct-20-2020

Bootstrap Your Own Latent (BYOL) is a self-supervised learning approach for image representation. From an augmented view of an image, BYOL trains an online network to predict a target network representation of a different augmented view of the same image. Unlike contrastive methods, BYOL does not explicitly use a repulsion term built from negative pairs in its training objective. Yet, it avoids collapse to a trivial, constant representation. Thus, it has recently been hypothesized that batch normalization (BN) is critical to prevent collapse in BYOL. Indeed, BN flows gradients across batch elements, and could leak information about negative views in the batch, which could act as an implicit negative (contrastive) term. However, we experimentally show that replacing BN with a batch-independent normalization scheme (namely, a combination of group normalization and weight standardization) achieves performance comparable to vanilla BYOL (73.9% vs. 74.3% top-1 accuracy under the linear evaluation protocol on ImageNet with ResNet-50). Our finding disproves the hypothesis that the use of batch statistics is a crucial ingredient for BYOL to learn useful representations.
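What "batch-independent" means here can be shown with a small sketch that compares batch normalization with group normalization on flat channel vectors. It is a simplification for illustration: spatial dimensions, learned affine parameters, and weight standardization are all omitted, the channel count is assumed to be divisible by the number of groups, and the function names and values are hypothetical.

```python
import math


def group_norm(sample, num_groups, eps=1e-5):
    """Normalize one sample's channels within groups; no batch statistics."""
    size = len(sample) // num_groups
    out = []
    for g in range(num_groups):
        group = sample[g * size:(g + 1) * size]
        mean = sum(group) / size
        var = sum((x - mean) ** 2 for x in group) / size
        out.extend((x - mean) / math.sqrt(var + eps) for x in group)
    return out


def batch_norm(batch, eps=1e-5):
    """Normalize each channel with statistics taken across the batch."""
    n = len(batch)
    channels = range(len(batch[0]))
    means = [sum(s[c] for s in batch) / n for c in channels]
    variances = [
        sum((s[c] - means[c]) ** 2 for s in batch) / n for c in channels
    ]
    return [
        [(s[c] - means[c]) / math.sqrt(variances[c] + eps) for c in channels]
        for s in batch
    ]


sample = [1.0, 3.0, 2.0, 6.0]
batch_a = [sample, [0.0, 0.0, 0.0, 0.0]]
batch_b = [sample, [10.0, -4.0, 7.0, 1.0]]

# Group norm sees only the sample itself, so its output cannot change
# with the other batch elements.
gn_out = group_norm(sample, num_groups=2)
# Batch norm output for the same sample depends on its batch companions.
bn_in_a = batch_norm(batch_a)[0]
bn_in_b = batch_norm(batch_b)[0]

print("group norm:         ", [round(v, 3) for v in gn_out])
print("batch norm, batch A:", [round(v, 3) for v in bn_in_a])
print("batch norm, batch B:", [round(v, 3) for v in bn_in_b])
```

The same sample gets different batch-norm outputs in the two batches, which is the channel through which information about other batch elements could leak; the group-norm output has no such dependence.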

artificial intelligence, machine learning, representation, (13 more...)

arXiv.org Machine Learning

2010.10241

Genre: Research Report > New Finding (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)
